NSF PAR Search | NSF Public Access Repository

History-Independent Concurrent Hash Tables

https://doi.org/10.1145/3717823.3718283

Attiya, Hagit; Bender, Michael A; Farach-Colton, Martín; Oshman, Rotem; Schiller, Noa (June 2025, ACM)

Free, publicly-accessible full text available June 15, 2026
History-Independent Dynamic Partitioning: Operation-Order Privacy in Ordered Data Structures

https://doi.org/10.1145/3733620.3733625

Bender, Michael A; Farach-Colton, Martín; Goodrich, Michael T; Komlós, Hanna (April 2025, ACM SIGMOD Record)

A data structure is history independent if its internal representation reveals nothing about the history of operations beyond what can be determined from the current contents of the data structure. History independence is typically viewed as a security or privacy guarantee, with the intent being to minimize risks incurred by a security breach or audit. Despite widespread advances in history independence, there is an important data-structural primitive that previous work has been unable to replace with an equivalent history-independent alternative: dynamic partitioning. In dynamic partitioning, we are given a dynamic set S of ordered elements and a size parameter B, and the objective is to maintain a partition of S into ordered groups, each of size Θ(B). Dynamic partitioning is important throughout computer science, with applications to B-tree rebalancing, write-optimized dictionaries, log-structured merge trees, other external-memory indexes, geometric and spatial data structures, cache-oblivious data structures, and order-maintenance data structures. The lack of a history-independent dynamic-partitioning primitive has meant that designers of history-independent data structures have had to resort to complex alternatives. In this paper, we achieve history-independent dynamic partitioning. Our algorithm runs asymptotically optimally against an oblivious adversary, processing each insert/delete with O(1) operations in expectation and O(B log N / log log N) with high probability in set size N.
Free, publicly-accessible full text available April 28, 2026
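The definition of history independence in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's partitioning algorithm; the class names `InsertionOrderSet` and `CanonicalSortedSet` and the helper `run` are hypothetical, chosen only for this illustration. The first structure's memory layout depends on the order of operations, while the second keeps a canonical (sorted) layout that depends only on the current contents.

```python
import bisect


class InsertionOrderSet:
    """Not history independent: the layout records the order of insertions."""

    def __init__(self):
        self.items = []

    def insert(self, x):
        if x not in self.items:
            self.items.append(x)

    def delete(self, x):
        if x in self.items:
            self.items.remove(x)

    def layout(self):
        return tuple(self.items)


class CanonicalSortedSet:
    """History independent: the layout is a function of the contents only."""

    def __init__(self):
        self.items = []

    def insert(self, x):
        i = bisect.bisect_left(self.items, x)
        if i == len(self.items) or self.items[i] != x:
            self.items.insert(i, x)

    def delete(self, x):
        i = bisect.bisect_left(self.items, x)
        if i < len(self.items) and self.items[i] == x:
            self.items.pop(i)

    def layout(self):
        return tuple(self.items)


def run(cls, ops):
    """Apply a list of (operation, element) pairs and return the final layout."""
    s = cls()
    for op, x in ops:
        getattr(s, op)(x)
    return s.layout()


# Two different histories that end with the same contents {1, 2, 3}.
history_a = [("insert", 3), ("insert", 1), ("insert", 2)]
history_b = [("insert", 2), ("insert", 9), ("insert", 1),
             ("delete", 9), ("insert", 3)]

leaky_a = run(InsertionOrderSet, history_a)
leaky_b = run(InsertionOrderSet, history_b)
canon_a = run(CanonicalSortedSet, history_a)
canon_b = run(CanonicalSortedSet, history_b)
```

Here `leaky_a` and `leaky_b` differ even though both sets hold the same elements, so an observer of the layout learns something about the history. `canon_a` and `canon_b` are identical. The sorted array pays linear time per update, which hints at why an efficient history-independent primitive is nontrivial.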
Optimal Bounds for Open Addressing Without Reordering

https://doi.org/10.1109/FOCS61266.2024.00045

Farach-Colton, Martín; Krapivin, Andrew; Kuszmaul, William (October 2024, IEEE)

Full Text Available
The Case for External Graph Sketching

https://doi.org/10.1137/1.9781611978759.9

Bender, Michael A; Farach-Colton, Martín; Jacob, Riko; Komlós, Hanna; Tench, David; West, Evan T (January 2025, Society for Industrial and Applied Mathematics)

Full Text Available
Tiny Pointers

https://doi.org/10.1145/3700594

Bender, Michael A; Conway, Alex; Farach-Colton, Martín; Kuszmaul, William; Tagliavini, Guido (October 2024, ACM Transactions on Algorithms)

This paper introduces a new data-structural object that we call the tiny pointer. In many applications, traditional \(\log n\)-bit pointers can be replaced with \(o(\log n)\)-bit tiny pointers at the cost of only a constant-factor time overhead and a small probability of failure. We develop a comprehensive theory of tiny pointers, and give optimal constructions for both fixed-size tiny pointers (i.e., settings in which all of the tiny pointers must be the same size) and variable-size tiny pointers (i.e., settings in which the average tiny-pointer size must be small, but some tiny pointers can be larger). If a tiny pointer references an item in an array filled to load factor \(1-\delta\), then the optimal tiny-pointer size is \(\Theta(\log\log\log n+\log\delta^{-1})\) bits in the fixed-size case, and \(\Theta(\log\delta^{-1})\) expected bits in the variable-size case. Our tiny-pointer constructions also require us to revisit several classic problems having to do with balls and bins; these results may be of independent interest. We apply tiny pointers to five classic data-structure problems and show the following:
A data structure storing \(n\) \(v\)-bit values for \(n\) keys with constant-factor time modifications/queries can be implemented to take space \(nv+O(n\log^{(r)}n)\) bits, for any constant \(r \gt 0\), as long as the user stores a tiny pointer of expected size \(O(1)\) with each key (here, \(\log^{(r)}n\) is the \(r\)-th iterated logarithm).
Any binary search tree can be made succinct, meaning that it achieves \((1+o(1))\) times the optimal space, with constant-factor time overhead, and can even be made to be within \(O(n)\) bits of optimal if we allow for \(O(\log^{*}n)\)-time modifications; this holds even for rotation-based trees such as the splay tree and the red-black tree.
Any fixed-capacity key-value dictionary can be made stable (i.e., items do not move once inserted) with constant-factor time overhead and \((1+o(1))\)-factor space overhead.
Any key-value dictionary that requires uniform-size values can be made to support arbitrary-size values with constant-factor time overhead and with an additional space consumption of \(\log^{(r)}n+O(\log j)\) bits per \(j\)-bit value for an arbitrary constant \(r \gt 0\) of our choice.
Given an external-memory array \(A\) of size \((1+\varepsilon)n\) containing a dynamic set of up to \(n\) key-value pairs, it is possible to maintain an internal-memory stash of size \(O(n\log\varepsilon^{-1})\) bits so that the location of any key-value pair in \(A\) can be computed in constant time (and with no IOs).
In each case, tiny pointers allow us to take a natural space-inefficient solution that uses pointers and make it space-efficient for free.
Full Text Available
Exploring the Landscape of Distributed Graph Sketching

https://doi.org/10.1137/1.9781611978339.11

Tench, David; West, Evan T; Zhang, Kenny; Bender, Michael A; DeLayo, Daniel; Farach-Colton, Martín; Gill, Gilvir; Seip, Tyler; Zhang, Victor (January 2025, Society for Industrial and Applied Mathematics)

Full Text Available
Nearly Optimal List Labeling

https://doi.org/10.1109/FOCS61266.2024.00132

Bender, Michael A; Conway, Alex; Farach-Colton, Martín; Komlós, Hanna; Koucký, Michal; Kuszmaul, William; Saks, Michael (October 2024, IEEE)

Full Text Available
Online List Labeling: Breaking the \({\log^2 n}\) Barrier

https://doi.org/10.1137/22M1534468

Bender, Michael A.; Conway, Alex; Farach-Colton, Martín; Komlós, Hanna; Kuszmaul, William; Wein, Nicole (October 2024, SIAM Journal on Computing)

Full Text Available
Beyond Bloom: A Tutorial on Future Feature-Rich Filters

https://doi.org/10.1145/3626246.3654681

Pandey, Prashant; Farach-Colton, Martín; Dayan, Niv; Zhang, Huanchen (June 2024, ACM)

Full Text Available
GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)

https://doi.org/10.1145/3643846

Tench, David; West, Evan; Zhang, Victor; Bender, Michael A; Chowdhury, Abiyaz; Delayo, Daniel; Dellas, J Ahmed; Farach-Colton, Martín; Seip, Tyler; Zhang, Kenny (September 2024, ACM Transactions on Database Systems)

Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to solving the connected components problem on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is an inherent limitation of this approach and is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM. We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call GraphZeppelin, uses new linear sketching data structures (CubeSketch) to solve the streaming connected components problem and as a result requires space asymptotically smaller than the space required for a lossless representation of the graph. GraphZeppelin is optimized for massive dense graphs: GraphZeppelin can process millions of edge updates (both insertions and deletions) per second, even when the underlying graph is far too large to fit in available RAM. As a result, GraphZeppelin vastly increases the scale of graphs that can be processed.
Full Text Available
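The "store the entire graph in RAM" baseline that the GraphZeppelin abstract above describes can be sketched in a few lines of Python. This is only that lossless baseline, not GraphZeppelin or CubeSketch; the function names `process_stream` and `connected_components` and the update format `(op, u, v)` are assumptions made for illustration. The sketch keeps every live edge in a set, which is exactly the memory cost that becomes prohibitive for dense graphs, and then runs union-find over the surviving edges.

```python
def connected_components(num_vertices, edges):
    """Union-find over an explicit edge collection; returns sorted components."""
    parent = list(range(num_vertices))

    def find(v):
        # Path halving keeps the trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    groups = {}
    for v in range(num_vertices):
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())


def process_stream(num_vertices, updates):
    """Apply edge insertions and deletions, then compute the components."""
    edges = set()  # lossless representation: memory grows with the edge count
    for op, u, v in updates:
        edge = (min(u, v), max(u, v))  # undirected edge in canonical order
        if op == "insert":
            edges.add(edge)
        else:
            edges.discard(edge)
    return connected_components(num_vertices, edges)


updates = [
    ("insert", 0, 1),
    ("insert", 1, 2),
    ("insert", 3, 4),
    ("delete", 1, 2),
]
components = process_stream(5, updates)
```

After the deletion of edge (1, 2), vertex 2 is isolated, so `components` is `[[0, 1], [2], [3, 4]]`. A dense graph on n vertices can have on the order of n² edges in the `edges` set, which is the limitation the abstract attributes to this approach.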
